Add multilingual MATH-500 (mmath500) task #5

Merged
dzautner merged 4 commits into main from daniel/translate-prompts on Mar 26, 2026

Conversation

@dzautner

Summary

  • Add multilingual MATH-500 Finnish task (mmath500:fi) with model-graded scoring via inspect-ai
  • Fix scorer model to use env vars (SCORER_MODEL_BASE_URL, SCORER_MODEL_PATH) instead of hardcoded vLLM init at module import time
  • Translate mmath500 prompt template to Finnish

Test plan

  • Ran mmath500:fi on TW cluster across multiple checkpoints
  • Confirmed scorer model (Qwen3.5-9B) loads correctly via env vars

Daniel Zautner added 3 commits March 25, 2026 12:53

  • Uses LumiOpen/MATH-500_mt dataset with Qwen3.5-9B (reasoning disabled) as scorer.
  • Read SCORER_MODEL_BASE_URL/SCORER_MODEL_PATH from env to connect to an existing scorer server started by the eval harness. Falls back to using the eval model (like original math_500) when no scorer server is set up.
  • Finnish and Danish prompts reviewed by native speakers (Kai, Maria).
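
The env-var lookup and fallback described in the second commit could be sketched roughly as below. This is an illustration only, not the code from this PR: the helper name `resolve_scorer` and the returned dict shape are hypothetical, and requiring both variables to be set is an assumption. Only the two variable names come from the commit message.

```python
import os
from typing import Dict, Mapping, Optional


def resolve_scorer(env: Mapping[str, str] = os.environ) -> Optional[Dict[str, str]]:
    """Return settings for an external scorer server, or None to fall back.

    Hypothetical helper: reading the environment at call time (not at
    module import time) means importing the task has no side effects.
    """
    base_url = env.get("SCORER_MODEL_BASE_URL")
    model_path = env.get("SCORER_MODEL_PATH")
    # Assumption: both variables must be present and non-empty to use the server.
    if base_url and model_path:
        return {"base_url": base_url, "model": model_path}
    # No scorer server configured: the caller grades with the eval model instead.
    return None


if __name__ == "__main__":
    settings = resolve_scorer()
    if settings is None:
        print("No scorer server configured; falling back to the eval model")
    else:
        print("Using scorer server at", settings["base_url"])
```

Passing the environment in as a mapping keeps the lookup easy to test without touching the real process environment.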
@MariaBarrett335

Kai said he disabled reasoning for the scorer model. Can we set that as the default?
Other than that, it looks good to me.

Pass enable_thinking=False via extra_body to the scorer model so it
doesn't waste tokens on chain-of-thought when grading answers.
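
As a rough sketch of what such an `extra_body` payload might look like: the helper `scorer_extra_body` is hypothetical, and nesting the flag under `chat_template_kwargs` is an assumption based on how vLLM's OpenAI-compatible server commonly accepts chat-template options for Qwen models. The commit message only says that `enable_thinking=False` is passed via `extra_body`, so the actual payload in this PR may differ.

```python
from typing import Any, Dict, Optional


def scorer_extra_body(
    enable_thinking: bool = False,
    extra: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build an extra_body dict that sets enable_thinking for the scorer.

    Hypothetical helper. The "chat_template_kwargs" nesting is an assumption,
    not something confirmed by this PR.
    """
    # Copy so the caller's dict is not mutated.
    body: Dict[str, Any] = dict(extra or {})
    template_kwargs = dict(body.get("chat_template_kwargs", {}))
    template_kwargs["enable_thinking"] = enable_thinking
    body["chat_template_kwargs"] = template_kwargs
    return body


if __name__ == "__main__":
    print(scorer_extra_body())
```

Defaulting `enable_thinking` to `False` matches the request in the review comment that disabled reasoning be the default for grading.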
@MariaBarrett335

looks good to me

@MariaBarrett335

Sorry, I accidentally hit "Close with comment".

@dzautner dzautner merged commit 352d4ce into main Mar 26, 2026
7 of 9 checks passed

2 participants